Multilabel Classification over Category Taxonomies
Authors
Lijuan Cai
Abstract
“Multilabel Classification over Category Taxonomies” by Lijuan Cai, Ph.D., Brown University, May 2008.
Multilabel classification is the task of assigning a pattern to one or more classes or categories from a pre-defined set of classes. It is a crucial tool in knowledge and content management. Standard machine learning techniques such as Support Vector Machines (SVMs) and the Perceptron have been successfully applied to this task. However, many real-world classification problems involve large numbers of overlapping categories that are arranged in a hierarchy or taxonomy. Standard learning algorithms ignore this class hierarchy and thereby lose valuable information. In this thesis, we propose to systematically incorporate prior knowledge of the category taxonomy directly into the learning architecture. We present two methods, hierarchical SVM learning and hierarchical Perceptron learning. Both methods take a ranking view of the multilabel problem by focusing on ranking category relevances. In the hierarchical SVM, the hierarchical learning problem is expressed as a joint large margin formulation that simultaneously learns the discriminant functions of all classes. As the resulting optimization problem can be prohibitively large, we also present a variable selection algorithm to solve it efficiently. In the hierarchical Perceptron method, the construction of the weight vectors and the update rule are designed to capture the category taxonomy. Both methods can leverage kernel techniques, work with arbitrary directed acyclic graph taxonomies, and be applied to general settings where categories can be characterized by attributes. We also present an automatic approach to learning a taxonomy when one is not available. Our approach is adapted from the hierarchical agglomerative clustering algorithm. The learned hierarchy can then be used in existing hierarchical classification approaches. Extensive experiments demonstrate the performance advantage of our approaches.
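To make the idea of a taxonomy-aware ranking Perceptron more concrete, here is a minimal sketch. It is not the thesis's exact algorithm: it assumes that each category's weight vector is the sum of per-node weights along the path from the category to the root, and that an update fires whenever the lowest-scoring relevant category does not outrank the highest-scoring irrelevant one. The toy taxonomy, the feature names, and the function names (`path_to_root`, `score`, `train`, `rank`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (child -> parent); None marks the root.
PARENT = {"root": None, "sports": "root", "science": "root",
          "soccer": "sports", "tennis": "sports", "physics": "science"}
CATEGORIES = ["soccer", "tennis", "physics"]

def path_to_root(category):
    """Return the category followed by all of its ancestors."""
    nodes = []
    while category is not None:
        nodes.append(category)
        category = PARENT[category]
    return nodes

def score(weights, x, category):
    """Sum the linear scores of every node on the category's path (assumption)."""
    return sum(weights[node][f] * v
               for node in path_to_root(category)
               for f, v in x.items())

def train(examples, categories, epochs=10):
    """Ranking-style Perceptron: relevant categories should outscore irrelevant ones."""
    weights = defaultdict(lambda: defaultdict(float))
    for _ in range(epochs):
        for x, labels in examples:
            relevant = [c for c in categories if c in labels]
            irrelevant = [c for c in categories if c not in labels]
            if not relevant or not irrelevant:
                continue
            worst_rel = min(relevant, key=lambda c: score(weights, x, c))
            best_irr = max(irrelevant, key=lambda c: score(weights, x, c))
            if score(weights, x, worst_rel) <= score(weights, x, best_irr):
                # Promote the relevant path and demote the irrelevant path.
                # Nodes shared by both paths receive +x and -x, which cancel.
                for node in path_to_root(worst_rel):
                    for f, v in x.items():
                        weights[node][f] += v
                for node in path_to_root(best_irr):
                    for f, v in x.items():
                        weights[node][f] -= v
    return weights

def rank(weights, x, categories):
    """Order categories from most to least relevant for input x."""
    return sorted(categories, key=lambda c: score(weights, x, c), reverse=True)

# Hypothetical sparse bag-of-words examples: (features, set of relevant categories).
EXAMPLES = [
    ({"ball": 1, "goal": 1}, {"soccer"}),
    ({"ball": 1, "racket": 1}, {"tennis"}),
    ({"quantum": 1, "energy": 1}, {"physics"}),
    ({"ball": 1, "goal": 1, "racket": 1}, {"soccer", "tennis"}),
]

weights = train(EXAMPLES, CATEGORIES)
print(rank(weights, {"ball": 1, "goal": 1}, CATEGORIES))
```

Because sibling categories share the weights of their common ancestors, evidence for one category also raises the score of its siblings relative to categories in other branches, which is one simple way a taxonomy can inform the ranking.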
Multilabel Classification over Category Taxonomies
by Lijuan Cai
B. Eng., Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, 1997
M. Eng., Computer Science, Nanjing University, 2000
M. Sc., Computer Science, Brown University, 2003
A dissertation submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in the Department of Computer Science at Brown University
Providence, Rhode Island
May 2008
© Copyright 2008 by Lijuan Cai
This dissertation by Lijuan Cai is accepted in its present form by the Department of Computer Science as satisfying the dissertation requirement for the degree of Doctor of Philosophy.
Thomas Hofmann, Director
Recommended to the Graduate Council
Chad Jenkins, Reader
Gregory Shakhnarovich, Reader
Approved by the Graduate Council
Sheila Bonde, Dean of the Graduate School
Similar resources
Multilabel Classification through Structured Output Learning - Methods and Applications
Aalto University, P.O. Box 11000, FI-00076 Aalto, www.aalto.fi. Author: Hongyu Su. Name of the doctoral dissertation: Multilabel Classification through Structured Output Learning - Methods and Applications. Publisher: School of Science. Unit: Department of Computer Science. Series: Aalto University publication series DOCTORAL DISSERTATIONS 28/2015. Field of research: Information and Computer Science. Manuscrip...
A probabilistic methodology for multilabel classification
Multilabel classification is a relatively recent subfield of machine learning. Unlike the classical approach, where instances are labeled with only one category, in multilabel classification an arbitrary number of categories is chosen to label an instance. Due to the problem complexity (the solution is one among an exponential number of alternatives), a very common solution (the binary meth...
Using Taxonomies for Product Recommendation
In this work we take advantage of valuable information encoded in taxonomies to improve the quality of recommender systems. We present three strategies that explore the use of taxonomies: (i) category descriptors, (ii) classification features and (iii) category filters. We provide a real-case study over the book domain, in which the recommendation target is a set of 100 news pages from The New Y...
On Maximum Margin Hierarchical Multilabel Classification
We present work in progress towards maximum margin hierarchical classification where the objects are allowed to belong to more than one category at a time. The classification hierarchy is represented as a Markov network equipped with an exponential family defined on the edges. We present a variation of the maximum margin multilabel learning framework, suited to the hierarchical classification t...
Multilabel associative classification categorization of MEDLINE articles into MeSH keywords.
The specific characteristic of classification of medical documents from the MEDLINE database is that each document is assigned to more than one category, which requires a system for multilabel classification. Another major challenge was to develop a scalable method capable of dealing with hundreds of thousands of documents. We proposed a novel system for automated classification of MEDLINE docum...